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Abstract 

Optimal prediction approximates the average solution of a large system of ordinary 
differential equations by a smaller system. We present how optimal prediction can be 
applied to a typical problem in the field of molecular dynamics, in order to reduce the 
number of particles to be tracked in the computations. We consider a model problem, 
which describes a surface coating process, and show how asymptotic methods can be 
employed to approximate the high dimensional conditional expectations, which arise 
in optimal prediction. The thus derived smaller system is compared to the original 
system in terms of statistical quantities, such as diffusion constants. The comparison 
is carried out by Monte-Carlo simulations, and it is shown under which conditions 
optimal prediction yields a valid approximation to the original system. 

Keywords: Optimal prediction, molecular dynamics, surface coating, hopping, Laplace's method, 
low temperature asymptotics, Monte-Carlo 

1 Introduction 

Computations in the field of molecular dynamics typically require a large computational 
effort due to two factors: 

1. Small time steps are required to resolve the fast atomic oscillations. 

2. Large systems are obtained due to the large amount of atoms which have to be 
computed. 

A wide variety of methods has been developed to remedy these problems. Larger time steps 
are admitted e.g. by smoothing algorithms, which average in time over the fast oscillations. 
Various other methods reduce the degrees of freedom, e.g. multipole methods [14] in the 
context of long range particle interactions. In this paper we investigate whether the 
method of optimal prediction, as presented and analyzed in [1,4-11,15,19], can in principle 
be applied to problems in molecular dynamics in order to reduce the number of atoms to 
be computed. As a first step in this investigation we confine to a one dimensional model 
problem which inherits particular properties from a real molecular dynamics problem. In 
Section [2| we present the real problem as it arises in applications, and derive the simplified 
model problem. The considered problem is Hamiltonian, hence in Section [3] we present 
the method of optimal prediction in the special case of Hamiltonian systems. In Section [4| 
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we apply the method of optimal prediction to the model problem. This yields expressions 
involving high dimensional integrals. We evaluate these integrals by asymptotic methods, 
employing the fact that the process is running at a low temperature, which yields a 
new and smaller system. Section [5] deals with the numerical speed-up. In Section [6] we 
define criteria, how to investigate whether the new system is a valid approximation to the 
original system. We show how important statistical quantities, such as diffusion constants 
and energy fluctuations, can be obtained by numerical experiments. These are presented 
in Section [7l and we investigate under which conditions optimal prediction preserves the 
relevant statistical quantities. 

2 Problem Description 

2.1 The Physical Problem 

In the production of semiconductors a thin layer (a few atomic monolayers) of copper has 
to be coated (sputtered) onto a silicon crystal. A technical description of the process can 
be found in [18]. Important for the quality of the product is that copper atoms must not 
penetrate too deeply into the silicon crystal. The copper atoms penetrate firstly by their 
kinetic energy when hitting the crystal surface, secondly by the process of atomic hopping, 
which will be described in the following. In order to obtain specific knowledge about these 
processes, molecular dynamics simulations have to be carried out, as described in [13]. 

One important aspect of the coating process is that the system is out of its thermo- 
dynamical equilibrium only for very short times, namely for about 10~^^ seconds after 
one copper atom has hit the surface of the silicon crystal. During this time the copper 
atom penetrates into the crystal and sonic waves transport the impact energy away to the 
bottom of the crystal, which is constantly being cooled. Hence, after 10~^^ seconds the 
whole crystal is in equilibrium again. On the other hand, the time between two copper 
atoms hitting the crystal surface is on a time scale of 10~^ seconds, i.e. the system is in 
equilibrium nearly all the time, in particular the temperature is constant with respect to 
space and time. 

However, even in the state of thermodynamical equilibrium single copper atoms can 
change their position in the silicon crystal by hopping events, i.e. a copper atom gains 
by accident enough energy to overcome the potential barrier between two layers in the 
silicon crystal and thus hops to a neighboring cell. By atomic hopping copper atoms can 
penetrate much deeper into the silicon crystal as their impact energy would allow, hence 
the process in equilibrium cannot be omitted from the computation. The average time 
between two hopping events is on a time scale of 10^^*^ seconds, while the fast atomic 
oscillations happen on a time scale of 10~^^ seconds. 

In this paper we show how the method of optimal prediction can be applied to the 
system in equilibrium. Only the atoms at the top layers of the crystal, where copper atoms 
can be found, are computed exactly, while the silicon atoms in the lower layers are kept 
track of only in an average sense. In order to keep the technical difficulties at a minimum, 
we set up a one dimensional model problem which simulates atomic hopping. 

2.2 The Model Problem 

In the model problem, we assume two major simplifications: 
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• Focus on a one dimensional problem, i.e. we consider n atoms lined up like beads on 
a cord. A single copper atom is inserted into a line of n — 1 silicon atoms. 

• The potential V{q) depends only on the pairwise distances of the atoms, i.e. 

n 

V{qi,...,qn)= fo.{qr-qj). (2.1) 

i,j=l 
i<j 

Here a G {1,2}, where /i is the potential between two silicon atoms and /2 is the 
potential between the copper atom and a silicon atom. 




d/A d/A 

Figure 1: Potential /i between two silicon Figure 2: Potential /2 between copper and 
atoms silicon 



Figures [U and [2] show the pair potentials /i and /2. The distance is given in A 
(lA = 10-iOm) and the energy in eV (leV = 1.6 • 10 ^^J). The potentials are chosen 
to be close to the corresponding Lennard-Jones potentials [17] in three space dimensions, 
in particular for /i the position of the minimum (re = 2.24A) and the energy at the 
minimum {E = —D, where D = 3.24eV) are chosen to fit the correct values given in [27]. 
In reality, hopping between two silicon atoms happens many times less likely than a copper 
hopping event. Hence, we neglect silicon hopping in the model problem and choose /i to 
be infinite at the origin. 

On the other hand, the pair potential /2 between copper and silicon is set to be finite 
at distance 0, in order to allow copper hopping. While in a three dimensional crystal a 
copper atom hops through one face of a crystal cell, in one space dimension the copper atom 
can only hop directly over a silicon atom in order to change its position in the crystal. 
Additionally, the value at distance is significantly lowered compared to the potential 
barrier set up by a face of a three dimensional silicon crystal. This makes hopping events 
much more frequent and thus reduces the simulation time required for observing hopping 
events. 

In order to further increase the hopping rate, we increase the temperature significantly. 
The real process is taking place at temperatures around 500K. We run the simulations at 
a temperature of 7000K, which is the maximum temperature that we can still call "low". 



3 OPTIMAL PREDICTION FOR HAMILTONIAN SYSTEMS 



4 



In this context a temperature being "low" means that the the dimensionless quantity 
e = ^jy- is a small number. For real temperatures one obtains e ^ 0.01, for the increased 
temperature e = 0.13. Consequently, any result obtained for increased temperatures can 
be expected to work even better at real temperatures. 

In the following, we will always assume our system to behave as an ergodic system. In 
particular this means that we assume space averages over a number of atoms to be equal 
to time averages over a given time span. 



3 Optimal Prediction for Hamiltonian Systems 

Optimal prediction was introduced by Chorin, Kast and Kupferman [7] as a method 
to apply to underresolved computation, i.e. to problems which are computationally too 
expensive or where not enough data is at hand, but prior statistical information is available. 
Sought is the mean solution of a system, where only part of the initial data is known, 
and the rest is sampled from an underlying measure. While the method is in principle 
not restricted to a particular measure, in equilibrium statistical mechanics one typically 
chooses the grand canonical distribution. Optimal prediction approximates the mean 
solution by a new system which is smaller, and thus cheaper to compute, than the original 
system. In this paper we will use a simple optimal prediction system to consider only 
a smaller number of atoms and "averaging" the other ones away. Note that optimal 
prediction is by no means restricted to Hamiltonian systems, but for such systems it has 
some particularly nice properties. 

Consider a 2n-dimensional Hamiltonian system of ordinary differential equations 

q=— p=-— (3 1) 

dp ^ dq 



with the Hamiltonian function 



Hiq,p) = ^p' + V{q), (3.2) 



representing an n-particle system in one space dimension. In the following, we will consider 
the model problem in equilibrium, hence we assume the position in state space to be 
distributed according to the grand canonical distribution 

/(g,p) = Z-ie-'5^('''P). (3.3) 

Here j3 = is a constant with /c^ being the Boltzmann constant, T the (constant) 

temperature of the process, and Z = J e"^^^^'^) dqdp is a normalization constant. 

We write the solution of (|3.ip as a phase flow, where ip{x, t) = {qi{t),pi{t), . . . , qn{t),Pnit)) 
denotes the solution to the initial condition x = (q^, p^, . . . , q„, p„). Consequently, system 
(|3.ip can be rewritten as 

j^^ix,t) = Riip{x,t)), (3.4) 
ip{x,0) = X. 

Assume that only m of the n atoms are of interest, which yields a separation of the degrees 
of freedom into two groups (p = {(f, (p), where (p = ((/Ji, . . . , (/?2m) represents the atoms of 
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interest, and (f = {(p2m+i, ■ ■ ■ , ^2n) corresponds to the n — m atoms which should be 
considered only in an averaged sense. Let in the following I = n — m denote the number 
of averaged atoms. Typically, m is significantly smaller than n. Note that in our model 
problem silicon atoms cannot hop, hence this separation stays valid over time, given the 
copper atom stays among the silicon atoms of interest. 

Now only part of the initial conditions x = (xi, . . . , X2m), namely the ones corre- 
sponding to the variables which are of interest (p, are known, while the other components 
X = {x2m+i , • • • ) X2n) are not known exactly. Instead, for each choice of x they are sampled 
from the conditioned measure 

U{x) = ZT^-f'^i^'-), (3.5) 

where = J e~^^^^'^^ dx is the appropriate normalization constant. As in [5] we use the 
conditional expectation projection of a function u{x,x) 

„ , 1 r u(x,x)e''^^^^'^^dx 

Pu = E n X = ^ V Lr- ,~ • 3.6 

Optimal prediction as presented in [1,4-9, 11] is interested in the first 2m components of 
the mean solution of (|3.4p , where the initial conditions x are fixed and x are sampled from 

. X r . Ml [ (/j((x,x),t)e-^^(*'^)dx , , 

P^(x,t) = n^{x,m = J . (3.7) 



For linear systems, expression (13. 7p can be computed exactly, and it does not decay. In 
molecular dynamics, however, the Hamiltonian system is in general nonlinear. As observed 
in [1,4], for nonlinear systems, the mean solution decays, which is interpreted as a loss of 
information as the first 2m variables tend to the thermodynamical equilibrium. In [1] the 
authors give a deeper physical reasoning for the decay. An application of the Mori-Zwanzig 
formalism as in [4-6] yields the formal explanation. 

For each choice of x the mean solution (13.7(1 can be approximated by Monte-Carlo 
sampling, i.e. sampling N times x from the conditioned measure (13. 5|) . solving N times 
the system (|3.4I) with initial values (x,x), and averaging over all solutions. Obviously, this 
is more expensive than solving the original system itself. 

In [4] the term first order optimal prediction has been assigned to idea of applying the 
conditional expectation projection P to the right hand side R 

9\ = PR = E[R\x], (3.8) 

which yields a function of just 2m variables. Hence, 9\ = (9^i, . . . , 9^2m) is a function from 
M?"^ to M^"^. The first order optimal prediction system is defined as 

y{t) = m{y{t)), y{0) = x. (3.9) 

An important result, which allows to restrict to considering the Hamiltonian function only, 
is the following 

Theorem 3.1 (O. Hald [10]) // a system is Hamiltonian, then its optimal prediction 
system is also Hamiltonian with the Hamiltonian function 

Sj{q^p) = -hog(- [ [ e-^^^^^P^^'P^dqdp] . (3.10) 



P \c 
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Here c is a constant with unit [c] = [q] ■ [p] = [kg^j . The exact value of c is of no 
importance for the dynamics. 

This theorem imphes that for nonhnear problems first order optimal prediction is 
never a good approximation for long times, since the mean solution decays, i.e. loses 
energy, while the first order optimal prediction system is Hamiltonian, and thus energy 
preserving. In [4-6, 10] higher order optimal prediction methods have been derived, which 
reproduce the desired decay of the mean solution. 

However, in our case, we are not interested in the mean solution, but in a 2m- 
dimensional system, which yields the same behavior of the first m atoms as the full 
2n-dimensional system would have yielded. This does not necessarily require a good 
approximation in the sense of trajectories (which the mean solution focuses on), but the 
relevant statistical quantities should be the recovered. In particular, the 2m-dimensional 
system should be Hamiltonian again. Hence, we choose the first order optimal prediction 
system as the sought 2m-dimensional system. Of course this choice can only be reasoned, 
if it turns out that the relevant statistical quantities are indeed preserved. We will focus 
on this question in Sections [6] and [7l 



4 Optimal Prediction Applied to the Model Problem 

4.1 Appropriate Domains of Integration 

As the pair potentials /i, /2 vanish at infinity, the expression 

/ e-^^(^)dx (4.1) 

is not finite. Hence, the canonical measure f{q,p) = Z~^e~^^^'^'P^ does not make sense 
as a probability distribution if the particle positions are not restricted in some way. In 
many text books on statistical mechanics the whole system is put into a box of finite 
volume, i.e. (gi, . . . , g„) € [—L, L]", where L is suitably large. This essentially means that 
all atoms are trapped, but no ordering is specified. In our model problem, however, the 
silicon atoms are ordered, and since silicon atoms cannot hop over each other, this order 
stays valid over time. This is an information which we do not wish to average out. Hence, 
we restrict the position vector {qi, . . . , q-n) to the domain 

= {(gi, . . . , g„) G [-L, L]"|g2 < • • • < ^n}- (4.2) 

Here qi is the position of the copper atom, which can be anywhere in [— L, L], as it can hop 
freely. The positions of the silicon atoms q2, - ■ ■ ,qn, however, are restricted to be ordered 
from left to right. With respect to optimal prediction, the domain 

M-" = {iqm+i,---,qn) G [-L,L]^\qm < q^+i < ■ ■ ■ < qn} (4.3) 

has to be introduced. For each fixed position vector q = {qi, . . . ,qm) the other I silicon 
atoms q are restricted to be positioned right of the first m atoms, and ordered. This setup 
assumes that the copper always stays among the first m atoms. 
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4.2 The Optimal Prediction Hamiltonian 

When considering the optimal prediction Hamiltonian (I3.10p as given by Theorem 13.11 
one can easily check that the kinetic and potential energy separate 

mP) = log f 4 / e-^^^^'^^ d;^) -i log (1 / e-^^«'^") dq] . (4.4) 



1 



2 

Here Cp and Cq are constants with units [cp] = kg™ and [cg] = m. Since T = ^ Y17=i Tit' 
the first term of (|4.4|) can be computed directly as 

'^(P) = o E - + ^' (4-5) 
2 rrii 

i=l 

where the constant C = — 2^ XliLm+i ( ) °^ relevance for the dynamics. 
Hence, with respect to the momenta applying first order optimal prediction means omitting 
the momenta Pm+i, ■ ■ ■ ,Pn- For the potential V, however, life is far from being as easy 
as for the kinetic energy T, as the g-variables do not separate and are no quadratic 
functions. Thus, an analytic evaluation of the first order optimal prediction potential 
23(g) is in general impossible, or at least too complicated to be of any use. Hence, we 
employ an asymptotic expansion of 2J(g). 

4.3 Low Temperature Asymptotics 

The dimensionless quantity e = = is small for low temperatures. The optimal 
prediction potential expressed in terms of the quantity e is 

^{q) = -De\og\\ [ e^i^(^'«)dg 1 , (4.6) 
YJm^ J 

where V{q,q) = j)V{q,q) is the potential normalized in such a way, that the potential of 
two atoms at equilibrium distance has the value -1. 

Using Laplace's method for integrals of real variables [24,25], one can find an asymptotic 
approximation to (14.61) for e small. In some textbooks this method is also referred to as 
Watson Lemma. Assume for the moment that for a fixed choice of q the function V{q, q) 
has a unique global minimizer r{q) G M' with respect to g, and that the Hessian at this 
point ^^{q, f{Q)) is regular. In the following derivation we use the abbreviatory notations 

r = r{q) and HqV = -Q^{q,r{q)). Laplace's method approximates V{q,q) by a quadratic 
function located at the minimum, yielding the following asymptotic approximation for 

e ^ 



e 

Mf- J Mi' 

q 



e 
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Extending the set of integration to the whole MJ- is valid, since the minimum is always in 
the interior of M-', provided L is large enough (see [25]). Since HgV is assumed to be 
regular, the transformation rule yields 



dq 



\ 



(2vre) 



det c'^HgV 



(4.7) 



Given V is of class (which is the case in our model problem), the complete asymptotic 
expansion including the error term is 



— / e = 

1 



V{q,q) 



dq 



(27re)' 



det c^gHqV 



(4.8) 



which follows directly from the one dimensional shown in [24, pp. 33-34]. Substi- 

tuting (14. Sp into (14. 6p . and employing that log(l + x) ~ x as x — > yields 



^iq) = V{q,r) + C+^log 



det CgHgV 



(4.9) 



where the constant C = — iD/elog (27re) is of no relevance for the dynamics. Hence, we 
found a zeroth and a first order asymptotic expansion in e for 2J, which - returning to 
long notation - are 



53o(9) 
5Ji(g) 



V{q,r{q)), 



D 



y{q,r{q))+e - — \og 



det 



D dq^ 



{q,r{q)) 



(4.10) 
(4.11) 



Note that 23o and 23i approximate 23 only up to constants, which are irrelevant for the 
acting forces. Whenever in the following we speak of 23j approximating 23, we always 
mean: "up to constants". 

The above assumptions, that V{q,q) has a unique global minimizer r{q) with respect 
to q, and that the Hessian at this point ^^{q, r{q)) is regular, can be observed to be guar- 
anteed for our model problem, given one restricts to the domain M~^. Both assumptions 
can be relaxed for the zeroth order expansion, i.e. also in the case of multiple minimizers 
or a singular Hessian the zeroth order expansion stays valid. 



4.4 Zero Temperature Limit 

The zeroth order approximation 23o (|4.10ll is the limit of 23 (|4.6I1 as e — > 0, i.e. T — > 0. 
Hence, we call 23o the zero temperature limit potential. Since the dynamics takes place at 
low temperatures, one can expect the correct optimal prediction potential function 23 to 
be close to the zero temperature limit potential 23o. Hence, we run the low temperature 
optimal prediction dynamics, which would be correctly described by 23, with the zero 
temperature limit dynamics given by 23o. We do not consider the first order approximation 
(|4.11|) here, since the Hessian ^^{q,r{q)) cannot be included in a straightforward manner 
into the following derivation. The results in Subsection [731 however, will show that further 
investigation on the first order expansion could be worthwhile. 
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By going over from 53 to 53o we have formally replaced an /-dimensional integration 
by an Z-dimensional minimization problem. At first glance this is no real improvement, 
since high dimensional global minimization is also computationally expensive (see [20]). 
However, the /-dimensional minimization means nothing else than placing / further atoms, 
such that the total potential energy is minimized. Since 2Jo is formally m-dimensional, we 
call the I new atoms virtual atoms. The restriction to the domain in Subsection 14.11 
guarantees that the / virtual atoms are separated from the m "real" atoms. 



4.5 Equations of Motion 

The zero temperature limit Hamiltonian is 

i3o(9,p) =To(p) +53o(g) = lp^^-^P + V{q,r{q)), (4.12) 

where 971 is a diagonal matrix containing the masses rrii of the atoms (9Jl)jj = m^. In the 
following we assume V{q,r) and r{q) to be of class C^. This allows to compute 

^i^^P) = ^iP)=^-'-P^ (4.13) 
^{<1,P) = ^(^) = |^('^>('^)) + |^(^'-(^))-|(^) = |^(^' (4-14) 

=0 

Note that ^iq,r{q)) is zero, since r(g) is the minimizer of V{q,r{q)). Still, expression 
(|4.14|) involves a minimization problem, in order to place the virtual atoms. We circumvent 

the minimization by deriving equations of motion for the virtual atoms, too. In order to 
obtain the expression ^{q), we define 

dV 

v{q) := ^{q,r{q)). (4.15) 

Since r{q) is always chosen to minimize V, we have that v{q) = Vg. Thus 

dv d'^V d'^V dr 

Solving Km for f|(g) yields an expression which can be substituted into the time evolu- 
tion ^r{q) = f|(^) ■ This yields a closed system for the zero temperature limit optimal 
prediction dynamics 



d . 

^P = -^iq,riq)) (4.17) 



m-^ -p 

_dV_ 
dt^ dq 



d (d'^V d'^V 



J- 



{q,r{q))] • ^T^(g,r(g)) • 9Jl ^ -p, 



dt \ dcp J dqdq 

where initially qi{0) = qi{0) and Pi{0) = Pi{0) and r(0) is chosen such that V{q{0),r{0)) 
is minimal, which can be resolved by a few Newton steps. 
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Since the virtual atoms always follow the minimum of the potential energy, no momenta 
are required to describe their movement. Hence, system (I4.17P is just (n+m)-dimensional, 
instead of 2n-dimensional. It is a closed system of ordinary differential equations, and the 
right hand side requires no integration or minimization. Still, there is an /-dimensional 
linear system of equations to be solved. At this point, we can employ the special structure 
of the potential energy (12. 1|) in our problem. 

Assume that the pair potentials /i, /2 reach over only k (in our model problem k ?a 10) 
equilibrium distances cIq. In the following derivation, we assume the potentials to really 
vanish at greater distances. The derived results hold approximately also for potentials 
which are negligibly small at greater distances. Consequently, only the first k virtual 
atoms rm+i, ■ ■ ■ ,rm+k actually have to be computed. The others will align equidistantly 
right to the first k ones. Since one atom has no influence on atoms more than k equilibrium 
distances away, only / = 2k virtual atoms need to be considered, where the last k ones are 
aligned equidistantly. As we are interested in the case m <^ enough virtual atoms are 
present, such that boundary effects can be neglected. Let the positions of the 2k virtual 
atoms be denoted by 



(4.18) 



where = (r^+i, • • • ,rm+k) and = rm+ke+ {do,2do, . . .,kdo) with e = (1, . . . , 1) 
The time derivative is 

d / 



di^ V ^m+ke ■ ^"^'^^^ 



In this setup the matrices in (|4.17|1 take a special structure: 



The Hessian ^^{q,r{q)) is a diagonal band matrix with band width k. In block 
form, where each block is of size k x k, it is 



where A12 is lower triangular and A21 upper triangular 

dqdq 



The matrix §^{q,^{q)) is upper triangular with width k, i.e. 



{q.r) = f\r^-q^\ ' ' ^ (4.21) 



dqidqra+j else 

In block form it can be written as 

9V f B12 



iQ^rm=( n n I (4-22) 



dqdq \ 

where B12 is an upper triangular k x k matrix. 

Substituting these special vectors and matrices into the equation for the virtual atoms in 
(|4.17l) yields the following equation of motion in block form 



All A12 
A21 A22 
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which imphes the fc-dimensional system for r 

Au-r^ = Bu-Tll^ -Pi. (4.24) 

Here An = An + {A12 ■ e) ■ e^, where Cfc = (0, . . . , 0, 1)^. The momenta of the last k real 
atoms are denoted by pi = {pm-k+i, ■ ■ ■ ,Pm)'^ i and Tli is the diagonal matrix containing 
the corresponding masses. This relation can be interpreted as a boundary layer condition 
which acts as if the crystal of silicon atoms was continued to infinity, although it is actually 
cut off after the m-th atom. 

With the above modifications the zero temperature limit optimal prediction system 
(|4.17|) becomes an explicit (2m + fc)-dimensional system of equations. Hence, one can 
expect this system to yield a computational speed-up, depending on the values of n, m 
and k. The question of speed-up will be considered in Section [5l Due to the various 
approximations in achieving the smaller system it is not at all clear whether the smaller 
system yields the same dynamics as the original system. In Section [6] we will compare the 
new system to the original system and investigate under which conditions the new system 
reflects the "correct" dynamics, and under which conditions it does not. 

5 Computational Speed-Up 

Besides the above physical interpretations, the actual intention was to use optimal pre- 
diction as a method to reduce the computational effort. In this section, we consider both 
the version with I virtual atoms and the boundary layer version. We compare the CPU 
times for computations of the two optimal prediction systems with the CPU time for the 
corresponding computations of the original system. The comparison is carried out in de- 
pendence of the sizes n and m. Since the original version of optimal prediction does only 
replace real atoms by virtual ones, while on the other hand the boundary layer condition 
version allows to really omit atoms, a significant speed-up can only be expected from the 
boundary layer condition version. 

The computations were performed on a network of AMD Athlon-6 1.4 GHz computers. 
The boundary layer condition version was computed with k = 10 virtual atoms. Figure [3] 
shows the speed-up factors with respect to the original system for the two versions of opti- 
mal prediction. The original version of optimal prediction does not yield any acceleration 
(the speed-up factors are less than 1). Apparently, for our model problem setting up and 
solving the linear system in (|4.17p is more expensive than computing the full system of 
equations. The boundary layer condition version, on the other hand, yields significant 
speed-up factors for small m. In principle, one can achieve arbitrarily high speed-up fac- 
tors by keeping m and k fixed and increasing n. In most cases, however, the original 
crystal size n is given a priori, and suitable values for m are given by the requirement 
that the new system must have the same dynamics as the original one, as discussed in 
Section [H 

A choice of m which is reasonable in many cases, is cutting the whole crystal of n atoms 
into two halves, i.e. m = [^J . The speed-up factors for the two optimal prediction systems 
are shown in Figure [H While the original version does not reduce the computational effort, 
the boundary layer condition version yields a good speed-up for larger n. 

Care should be taken when generalizing these results. In our model problem the 
potential functions are fairly cheap to evaluate. In more complicated systems it could 
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Figure 3: Speed-up factors depending on n 
and m. 
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Figure 4: Speed-up factors for n = 2m 



very well pay off to solve the linear system in (|4.17p instead, or to solve the minimization 
problem directly. On the other hand, the boundary layer condition version of optimal 
prediction could possibly fail to work in other applications, e.g. in three space dimensions. 
Such questions will have to be investigated when applying optimal prediction to a new 
problem. 

6 Comparing Two Molecular Dynamics Systems 

In molecular dynamics trajectories in high dimensional phase space are no appropriate 
means for comparing two systems, since initial positions and momenta can never be known 
exactly, and molecular dynamics is typically chaotic. Instead, "comparing" means to test 
whether both systems have similar dynamics. This is represented by statistical quantities 
such as time correlation functions, diffusion constants, fluctuations of energy, etc. We 
consider the following statistical processes in order to compare the two systems: 

• The distribution of the position the copper atom. A copper atom, which is 
initially located always at the same position, is in the ensemble of many experiments 
described by a diffusion process, whose distribution can be approximated by Monte- 
Carlo sampling. 

• The number of hopping events up to a fixed time. 
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• The fluctuation of energy of the first m atoms. The energy of the first m 
atoms fluctuates around some fixed average. We consider the variance of the energy 
over a fixed time interval. 

The first two quantities are related to the diffusion of a single copper atom in the silicon 
crystal due to hopping events. Since copper hopping has been the effect of interest in the 
first place, it is a natural criterion for comparison. 

In molecular dynamics, statistical quantities of ergodic systems can be computed by 
long time averaging or Monte-Carlo sampling. "Long time averaging" means running a 
single computation and using limiting processes to approximate statistical quantities. Im- 
portant examples, e.g. for approximating self- diffusion constants [2], are the Green-Kubo 
formula [17, 23] or the Einstein relation, which both approximate ensemble averages for 
ergodic systems by long time averaging. In our application, however, long-time computa- 
tions are problematic, since firstly the copper atom may travel to the boundaries of the 
silicon crystal and secondly sonic waves will come in effect, as shown in Section [7l Hence, 
we obtain the diffusion constant by Monte-Carlo sampling instead, i.e. we solve the same 
system over and over again with short computation times, where the initial conditions are 
sampled from the canonical measure. In other words: We obtain the averaged quantities 
not by long-time averaging, but by averaging over many samples. 

Sampling both the initial positions qi and momenta pi from the canonical measure 
require expensive methods like acceptance-rejection methods or Metropo- 
lis sampling [12,16] due to the structure of the potential V{q). We circumvent such prob- 
lems by setting the initial positions qi into the potential minimum and sampling the initial 
momenta pi from Z~^e~'^^P\ which is just sampling independently from Gaussian distri- 
butions. In our simulations after about 5 • 10~^^s the Hamiltonian dynamics has driven 
the system into equilibrium. Additionally, keeping the initial positions fixed automatically 
guarantees to remain in the correct domain as given by (14. 2p . 

6.1 A Random Walk Model for the Copper DifFusion 

The considered copper diffusion is due to hopping events, while displacements due to short 
oscillations between the same two silicon atoms are not taken into account. Hence, the cor- 
responding diffusion process is discrete in space on the spatial grid {— i^cZq, . . . , 0, . . . , vdo}. 
Let us assume for the moment, that the diffusion process is linear and hence described by 
a (2z/ -|- l)-dimensional compartment model 

u{t)=A-u{t), u(0) = e™+i = (0,..., 0,1,0,..., 0)^, (6.1) 

where A G m(22^+i)x(2!^+i) j^g^g column sums equal to zero to ensure mass conservation, 
i.e. ^ Sj'"i(i) = 0. The analytical solution to (|6.ip is 

u{t) = exp(t^) • em+i- (6.2) 

One example for such a process is given by the tridiagonal Toeplitz (up to the boundary 

entries) matrix with the stencil (1, —2, 1), which is a finite difference approximation to 

"o 

the heat equation. Hence, the compartment model (|6.ip with this matrix converges to the 
heat equation as do —>■ 0. Still, there are many other matrices A, whose discrete diffusion 
processes all converge to the heat equation as do — > 0. 
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The hopping behavior of the copper atom can be modeled as a specific random walk. 
One important aspect of the copper hopping is that hopping events are correlated, in the 
sense that given the copper atom has just hopped, it is much more likely than normally 
that a second hopping event to the same direction follows, since the kinetic energy is not 
completely lost in one single jump. Hence, the copper diffusion is formally non-Markovian. 
Still, the hopping can be described as a Markovian random walk by assuming hopping 
events to be uncorrelated, but to allow the copper atom to hop over more than one silicon 
atom in one single hopping event. 

These assumptions lead to a model which is typical in the context of stochastic pro- 
cesses, see [22]. Let Xt denote the position of the copper atom at time t. For the n*^ 
hopping event let r„ be the hopping time, ATn = Tn — Tn-i the time since the previous 
hopping event, and A„ G Z \ {0} the number of silicon atoms which the copper atoms 
hops over to the right. Here A„ < means hopping to the left. Assume now that both 
the ATji and the A„ are independent and distributed according to 

P (AT„ G [s, s + ds)) = a ■ exp(— as) ds, (6.3) 
P {\/^n\ = i) = Pi. (6.4) 

Here a is a parameter controlling the hopping rate (see [21,22] for a derivation), and {jpiji^n 
is a non-negative sequence satisfying X^jg^P* ~ \ ^^'^ constant 7 = X^iLi('^o^)^Pi is 
finite. For our model problem we simply assume p = {pi, . . . ,pk)- Additionally, we 
restrict to symmetric random walks, which is reasonable as long as the copper atom does 
not approach the crystal boundaries. 

An analysis of the described random walk, following the analysis in [22], yields that the 
variance of the process at time t equals Var(Xt) = 2jat. A comparison with the variance 
given by the heat equation yields the relation 

K = 7a, (6.5) 

which allows us to speak consistently of a diffusion constant k also for the discrete diffusion 
process (16. ip . As derived in [22] the probability distribution of the random walk Xt is 
described by (16. ip . where the so called infinitesimal generator ^ is a band matrix with the 
stencil a ■ {pk, ■ ■ ■ ,pi, —l,pi, . . . ,Pk)- Define ^ as a corresponding band matrix with the 
stencil 7"^ ■ {pk, ■ ■ ■ ,pi, —l,pi, . . . ,Pk)i hence A = kA. Both matrices have to be changed 
in the upper and lower rows according to the appropriate boundary conditions. 

Assume now, that the values pi, . . . ,pk are known. Hence, the matrix A is completely 
determined. Let v{t) G M^^+^ denote the numerically obtained distribution vectors for 
the position of the copper atom for all times t. Initially v{0) = Cm+i, as in (|6.ip . Since 
it is unclear, whether the diffusion parameter k is constant in time, we let it be a time 
dependent parameter K{t), which we compute at times ti, . . . For obtaining the value 
Kj = n{tj) we consider the diffusion on the time interval Ij = [tj,tj], where tj = tj — ^ 
and tj = tj + Given At is not too large, we can approximately assume the diffusion 
parameter to be constant on the given interval: k{t) = Kjyr £ Ij. On Ij we use the data 
provided by the real evolution v{t) to compute an L^-fit of the diffusion parameter Kj with 
respect to error functional 



(6.6) 
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which is particularly stable with respect to errors in v due to the Monte-Carlo sampling. 
The time At must on the one hand be small enough to justify the approximation that 
the diffusion parameter is constant on the intervals Ij, on the other hand it must be large 
enough to have already some diffusion taken place. In Subsection lT.ll we compute diffusion 
parameters for the numerical data obtained for our model problem. 

7 Numerical Experiments 

We simulate a crystal of n = 70 atoms with 69 silicon atoms and 1 copper atom, which 
starts at the 22"^^^ position. For the optimal prediction system, we choose m = 50. These 
are enough atoms such that averaged quantities make sense and one can speak of a ther- 
modynamical equilibrium. Additionally, the crystal has a reasonable interior region which 
is not affected by any boundary effects. The integration is performed by the classical ex- 
plicit fourth order Runge-Kutta method, which we prefer over energy preserving methods 
in this context, as it allows significantly larger time steps, while still resolving the hopping 
events accurately. The integration time step is At = 2.5 • 10~^^s, and the computation 
time is t* = 4.0 • 10~^^s, which is short enough, such that the change in total energy due 
to the integrator is insignificant. We solve the system N = 25000 times at a temperature 
T = 7000K, with the initial data sampled as described in Section [H For our experiments, 
Monte-Carlo estimates yield an expected error in the distribution of about 4- 10~^, which 
is significantly smaller than the difference of any two quantities in comparison. 

Note that the short computation time t* allows to exclude any effects caused by sonic 
waves traveling through the crystal. The velocity of sound can be estimated as derived in 
[17]: The equilibrium distance inside the silicon crystal is approximately do = 1.87-10~^°m. 
Using the Young modulus Y = do ■ ^ ^JP^ (do) = 5.05 • 10~^^^^ we obtain a velocity of 

sound of c = = 4.50 • 10^2i_ The shortest distance of the copper atom to any 

boundary in the crystal is sieft = q22 — Qi = 2.01 • 10~^m in the original system and 
bright = 950 ~ Q22 = 1-90 • 10~^m in the optimal prediction system. Hence, the minimum 
time a sonic wave takes to travel from a boundary to the copper atom is approximately 
tmin = ^right/c = 4.22- 10~^^s, which is longer than the computation time t* = 4.0- 10~^^s. 
In Subsection 17.41 we will deal with the case when sonic waves are present. 

Figure [5] shows the distribution of the position of the copper atom in the silicon crystal 
when solving the original system. The analogous distribution for the optimal prediction 
system looks indistinguishably similar in this kind of plot. The process is apparently of a 
diffusive nature, but the diffusion parameter changes with time, as already the evolution 
of the maximum indicates. The following analysis will confirm this observation. 

The time-dependent relative error between the distributions for the original and the 
optimal prediction system 

^ max^r \v{x,t) - v{x,t)\ 
maxx \v{x, t)| 

is less than 3% up to the time t = 3.0 ■ lO^^^s and still less than 9% over the complete 
time interval, which is not overwhelmingly small, but does indicate definite similarities 
between the two distributions. 
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Figure 5: Distribution of the copper atom's position 



7.1 DifFusion Parameters 

We compute the diffusion parameters K{t) for the original and the optimal prediction 
system, using the method described in Subsection 16.11 The parameters {pi, ■ ■ ■ ,Pk) of 
the corresponding random walk model are obtained by Monte-Carlo sampling. We use 
the experiments for the original system, which already yielded the distribution shown in 
Figure O In each Monte-Carlo experiment we follow the path of the copper atom and 
cluster consecutive hopping events which happen in an interval of Ati = 6.0 • 10~^^s to a 
single one. Prior to this clustering, fast double hopping events into opposing directions, i.e. 
those happening inside of At2 = 2.0 • 10~^'*s, which are solely due to oscillations of silicon 
atoms, must be excluded. The times Ati and At2 are suitably chosen for our model 
problem. 

The results of the above described Monte-Carlo experiments are shown in Figure [6l 
The hopping probabilities are given by the box histogram. Note that due to Monte-Carlo 
errors and boundary effects the resulting values are not exactly symmetric. Since we 
wish to consider a symmetric random walk, we choose as pi,...,pk the average values 
of the obtained results. Here, only pi,...,pii are greater than zero. Hence, we choose 
k = 11 for the computation of the diffusion parameters. The curve denotes the values i'^pi 
(scaled to fit into the same plot), which are relevant, since the variance Yli=i '^'^Pi = 71 
proportional to k (see relation (|6.5I1 ). 

Figure [7] shows the time-dependent diffusion parameter K{t) for the original system 
and the optimal prediction approximation, computed as described in Subsection 16. 1[ In 
both cases we choose At = 2.5 • 10~^^s. Important observations and estimates are: 

• The diffusion parameters start with a strong peak, and after 1 • 10~^^s they fluctuate 
around a fixed value of k = 4.4 • 10"^''^. This behavior is too pronounced to be 
only due to Monte-Carlo and approximation errors. The initial behavior is most 
likely due to the fact that the system does not exactly start in its thermodynamical 
equilibrium. 
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Figure 7: Diffusion parameters over time 



• Considering that K{t) shows such pronounced behavior, the two functions K{t) for the 
original system, and k{t) for the optimal prediction approximation are remarkably 
close to each other. Up to the time t = lA ■ 10~^^s the the distance between the 
two curves is quite small. After that time the curves differ more, but show the same 
features. After 3.0 • 10~^^s the error takes its maximum, which coincides with the 
errors observed in the distributions shown in Figure [H 

• In order to roughly estimate whether the obtained diffusion parameter is reasonable, 
we compare it with a diffusion coefficient measured in a real material. In [3] the 
diffusion constant of copper in a silicon crystal at a temperature of Tq = 1273K is 
given as = 4.4 • 10"^'^^. Assuming that the diffusion constant depends linearly 
both on temperature and the potential barrier, as e.g. with the Einstein formula [17], 
we obtain an estimate k ^ ''^0'%'^^ = 1.7-10"*^ for T = 7000K, using the values 
A£^o = 3eV and AE = 0.43eV. A look at Figure [7] indicates that our experimentally 
obtained diffusion parameters are indeed in this region. 

7.2 The Number of Hopping Events 

Besides diffusion parameters, also the pure number of hopping events which happen up to 
a given time t* should be preserved by optimal prediction. Figure [8] shows the number of 
hopping events for the above computation plotted as histograms. The solid line represents 
the original system, and the dashed line stands for the optimal prediction system. The 
two graphs differ only for the probabilities of zero and one hopping event. Apart from 
that, one can speak of the same hopping behavior. 
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Figure 8: Number of hopping events Figure 9: Total energy fluctuation 



7.3 Energy Fluctuations 

While the total energy is constant for the original system as well as for the optimal 
prediction approximation, the energy of the first m atoms 

EMtit) = 1^ E ^ + E fMt) - im (7.2) 

1=1 i,]=l 
i<j 

fluctuates over time. This expression fluctuates also for the optimal prediction system, 
since the influence of the virtual atoms is neglected in the potential energy. The fluctu- 
ations in (|7.2p represent the exchange of energy between atoms, which is a quantity that 
should be preserved. Since for optimal prediction we consider the energy of exactly the 
real atoms, this test enlightens the exchange of energy between real and virtual atoms. 
For each Monte-Carlo experiment we consider the variance of (|7.2p over time 

V= / (i?ieft(t)-^left(0))' dt, (7.3) 
Jt=Q 

which measures the impact of fluctuation. Hence, we obtain A'" values Vi, . . . , Vn for both 
the original system and the optimal prediction approximation. 

Figure [9] shows the histogram for the variances Vi for the two systems. The solid line 
stands for the original system, and the dashed line corresponds to optimal prediction. 
The average energy fluctuations for both systems are denoted by the corresponding ver- 
tical lines. While the average fluctuations are very close for the two systems and the two 
distributions look similar in principle, they obviously do not coincide. For optimal pre- 
diction most energy fluctuations are stronger than for the original system. On the other 
hand, particularly high energy fluctuations happen more frequently in the original system. 
Possible physical reasons for this behavior could be: 

• The fact, that in optimal prediction most fluctuations are stronger than in the 
original system might be due to the additional fluctuative Langevin terms, which 
appear in higher order optimal prediction [4]. 
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• The high fluctuations in the original system can happen, since around the m atom 
energy can be exchanged freely among a whole cluster of atoms. In the zeroth 
order approximation to optimal prediction (|4.10|) . on the other hand, there is no 
free energy exchange between the virtual atoms, since they have no momentum, 
but instead always follow the potential minimum. Using the first order asymptotic 
expansion (14. lip might remedy this problem. 

Additionally, at correct temperatures (T = 500K instead of T = 7000K) the energy 
fluctuations in optimal prediction might be much closer to the truth, even for the zeroth 
order approximation. 

7.4 Sonic Waves 

In the above experiments care was taken to exclude sonic waves, which was done by 
keeping the computation time shorter than a wave would take to travel through half the 
crystal. If the computation time is quadrupled and the diffusion distribution analogous 
to the one shown in Figure [5] is computed, one can indeed observe a small antidiffusive 
peak every 5 • 10~^^s, i.e. in many experiments the copper atom is systematically pushed 
back to its starting position. The fact that the above time equals approximately the time 
a sonic waves takes for traveling from the crystal boundaries to the copper atom gives rise 
to the assumption that the observed behavior is indeed due to sonic waves. 

The relevance of sonic waves for optimal prediction can be seen if one lets a sonic 
wave run into the boundary between real and virtual atoms. The wave does not penetrate 
into the block of virtual atoms, but is being reflected by them. Hence, optimal prediction 
simulates a crystal continued to infinity only as long as the system is in thermodynamical 
equilibrium. Non-equilibrium effects, in particular sonic waves, are not reproduced cor- 
rectly by the optimal prediction system in the presented form. Indeed, in the presence of 
sonic waves optimal prediction yields a very different copper diffusion, unlike the experi- 
ments where sonic waves were excluded. Thus the influence of non-equilibrium effects has 
to be considered carefully when optimal prediction is applied to other problems. 

8 Conclusions and Outlook 

In this paper, we applied the method of optimal prediction to a model problem in the 
context of molecular dynamics, focusing on diffusion due to atomic hopping. Employing 
the fact that the temperature of the process is low, asymptotic methods were applied to 
evaluate the conditional expectations which arise in optimal prediction. The zeroth order 
asymptotic expansion was used to derive a new system of equations, which is formally 
smaller. Since in molecular dynamics potentials typically range only over short distances, 
the new system could be cut off after a boundary layer of virtual atoms. This boundary 
layer condition acts as if the crystal was continued to infinity, and yields an obvious 
computational speed-up for our model problem. The asymptotic method itself should 
apply to much more general cases than the specific model problem here. 

In order to investigate whether the thus derived system preserves the statistical behav- 
ior of the original system, various criteria were introduced and checked by Monte-Carlo 
experiments. In particular, the diffusion of a copper atom in the crystal as well as the 
number of hopping events turned out to be preserved well. The exchange of energy at 
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the boundary layer was not preserved that well, but this discrepancy could be explained. 
On the other hand, the new system yielded significantly worse results in the presence of 
non-equilibrium effects, in particular sonic waves, since optimal prediction assumes the 
system to be in equilibrium. 

A natural next step in research on this topic would be to apply the method to a more 
complex problem, possibly in three space dimensions and with focus on further effects 
than atomic hopping. The basic ideas presented in this paper should also apply in three 
space dimensions, but various aspects will become more problematic. On the other hand, 
in three space dimensions sonic waves should dissipate faster, and the fraction of atoms 
which can be averaged out would be larger. Additionally, the relevant statistical quantities 
should be approximated much better at physically correct temperatures. An obvious step 
for deriving a more accurate system is to employ the first order asymptotic expansion, 
which was derived in Section IH Of particular interest in this context is the question, how 
to generalize the method of optimal prediction to problems not in equilibrium. 

We have shown that optimal prediction can in principle be applied to problems in 
molecular dynamics which take place at comparably low temperatures and are in equi- 
librium. The boundary condition version yielded an obvious speed-up. While the pure 
speed-up is not overwhelming by itself, the physical interpretation of the optimal pre- 
diction equations should allow to apply the method in combination with other methods. 
Further investigation may improve the derived results. 
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